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ABSTRACT 


We typically think of fitting data with an approximating curve in the linear 
least squares sense, where the sum of the residuals in the vertical, or y, direction is 
minimized. The problem addressed here is to fit a Bézier curve to an ordered set 
of data in the total least squares sense, where the sum of the residuals in both the 
horizontal and vertical directions is minimized. More exact: given an ordered set of m 
data points d;, 2 = 1,2,...,m find a set of control points b;, 7 = 0,1,...,n where n is 
the order of the Bézier curve, and a vector t of nodes, 0 < ¢; < te <--- <t, <1 that 
minimize || B(t)P — D ||p. The matrix D contains the data points, the matrix P 
contains the control points, and the matrix B(t) is a Bernstein matrix. The algorithm 
to accomplish this is explained in detail and makes extensive use of the linear algebra 


representation of Bézier curves. 
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I. INTRODUCTION 


Notation used in this paper is as follows: matrices are slanted upper case 
letters, vectors are lower case bold letters, and scalars are slanted lower case letters. 
Also, vectors are column arrays unless otherwise indicated. For ease of discussion, a 


point and a vector are equivalent. 


A. BERNSTEIN POLYNOMIALS 


Bernstein polynomials have the following general form: 


OP (a 


where t € [a,)]. 
We will only consider the case where a = 0 and 6 = 1. Equation (1.1) now 


takes on the form; 


Br(t) = (") (ty 4), f= 0, 1 cscs (1.2) 


which is a scalar for a particular ¢ € [0,1]. From Equation (1.2) note that BO(¢) = 1. 
Also, we define B?(t) = 0 if 7 <O org >n. 

The set of Bernstein polynomials of degree n form a basis for P,, the space of 
polynomials of degree n or less. The linear transformation from power basis coefh- 
cients to Bernstein basis coefficients is explained in detail in |Ref. 1]. For a complete 


discussion on the properties of Bernstein polynomials see [Ref. 2]. 


B. BEZIER CURVES 


Bézier curves are named after P. Bézier and are used extensively in computer 
aided geometric design. Before presenting the general form for a degree n Bezier 
curve, let us look at an example. 

Consider two points on the x-axis given by by = (2,0) and b; = (4,0), and 


suppose that we want to describe a degree 1 curve between these two points. A 





parametric representation of this curve is; 
a(t)=2+ 2t, y(t)=0, ¢€ [0,]] 


Note that since B}(t) for i = 0,1 is a basis for P,, we can write z(t) and y(t) in terms 


of Bernstein polynomials. Using matrices, we have; 


[x(z), y(t) ] [2+ 2, 0] 


(2(1 — t) + 4t, 0(1 —t) + 08} 


2 0 
= |B) BO] 


The parametric representation of a curve in the form of Equation (1.3) is of central 

importance in this paper. Also, all examples in this paper are in 2-dimensions, but 

it is understood that this representation holds for higher dimensional space as well. 
In general, a Bézier curve of degree n, denoted b5(¢), can be written as 


S> BP(t)b? (14) 


1=0 


B2(t)bs + B?(t)b? +...4+ Br(t)b? 


| 


(bg (t))* 


where bG(t) is a point on the curve for a particular t € [0,1]. The points b;,z = 0,...,n 
are called control points. We see that a point on a Bézier curve is a weighted sum of 
control points, where the weights are Bernstein polynomials evaluated at a particular 
value of f. 

Let us look at an example of a cubic Bézier curve before discussing properties 
of these curves. Figure (1) on page 9 shows four control points and a curve starting 
at control point bp and ending at control point b3. Note that we need n+ 1 control 
points for a degree n curve. Also note that, in this example, the curve does not pass 


through b, or by. 





Figure (2) on page 9 shows how the curve changes when the positions of control 
points b; and by are interchanged. This is why Bézier curves are so widely used in 
computed aided geometric design; by changing the position of one or more control 


points the user can easily change the shape of the curve. 


C. PROPERTIES OF BEZIER CURVES 


The following properties are basic to understanding Bézier curves and provide 


the necessary background for discussion later in this paper. 


1. Endpoint Interpolation 
Bézier curves interpolate between the and control points bp and b,. We saw 
in Figure (1) that the Bézier curve had bo as one-endpoint and bg as the other. This 
is always the case and is shown using Equation (1.4) with t = 0 and ¢t = 1. Because 
B?(0) = 0 except for 1 = 0, and B?(1) = 0 except for 1 = n, we have; 
r eS aro - 
i=0 


and 


(on 1))? => Br(1)bT = 


7=0 
Though Bézier curves are guaranteed to begin at bp and end at by, it is not guaranteed 


that they pass through any of the intermediate control points by, b2,..., br-1. 


2. Affine Invariance 
An affine transformation is of the form 6(p(t)) = Ap(t) +c. Examples of 
affine transformations are scaling, rotation, and reflection. The equation describing 


Bézier curves is affine invariant in that given 


(b2(t))7 = 57 Br(t) t)b? 


2=0 
under the transformation ® we will have 
b(( =o B?(t)®(b; ) 
1=0 





In other words, given an affine transformation ®, performing the following two steps 
produces equivalent results. 
1. Evaluate Equation (1.4) for a given set of control points and a value ¢ to 
obtain (b2(t))7, and then apply © to (bg(t))’. 
2. Apply © to each of the control points and then evaluate Equation (1.4) 
using the transformed control points and the same value ¢ as in 1. 
For example, suppose we want to rotate a given Bézier curve. We can either transform 
the relatively few control points or transform all the points of the curve. In general, 
transforming the control points and re-plotting the curve costs much less. Other 


properties of the Bézier curve are discussed in detail in [Ref. 2]. 


D. MATRIX REPRESENTATION OF BEZIER CURVES 
In this section we represent Bézier curves in linear algebra form. This paper 
makes extensive use of this representation for Bézier curves to simplify their manip- 


ulation. 


1. The Bernstein Matrix 
Equation (J.4) showed that a Bézier curve is described by 


(bp ())” = 0 BR(t)b;, te [0,1] 
7=0 
This is equivalent to the linear algebra form: 
bo 
(bo(t))” = [Bo(é) ... Br] ] : | = BO)P (1.5) 


where we will refer to B(t) as a Bernstein matriz. A Bernstein matrix is a generalized 


Vandermonde matrix which, for a vector t = [t),t2,...,tm]’, t; € [0,1] and a given 








degree n, has the form; 


Bolt) Bit) --- Bett) 

Bo(te) Bi(te) +--+ Br(te) 
B(t) = 

Bo(tm) Bi(tm) ++ Bu(tm) 


Therefore, we see that for t € R™ and a given degree n we have B(t) € R™*(@+)), 
In 2-dimension, the matrix P contains the z and y coordinates of the control 


points and has the following form: 


Ln Yn 


For example, if instead of evaluating Equation (1.5) for a particular t we are 
interested in getting m = 3 points on a Bézier curve of degree n = 3 we would 


evaluate; 


(bp (t))" = B(t) P 


where B(t) € R®** and P € R***. Therefore, (bg(t))? € R°*? and is a matrix of 
points on the curve. 

As another example, let n = 3 and only consider a particular t. We will expand 
the elements of B(t) to further demonstrate the inherent linear algebra form of Bézier 


curve representation. Note that we can write 


Be (t) —t? + 3t? — 3t4+1 
BU)? = B3(t) - 3? — 62? + 3¢ 
7 B3(t) - —3t? + 3t? 
B3(t) i 





and, therefore, we see that 


—-l 3 -3 1 i 
3 -6 3 0 t? 
B(t)? = = ae ie 
-3 3 0 0 t 
1 0 0 0 1 
where T' € R!*4 for the array. Equivalently, we have; 
B(t)=TM (1.6) 


If we now consider a vector t € R™, then instead of T € R!** we would have 
T ER™*4 . For a given matrix of control points we get the following equation which 


describes m points on a degree 3 Bézier curve. 


3 a —] 3 —3 ] Zo Yo 
tet? 6 6ty 1 : oe & Ss 
= Ly Vi 
(bg(t))? = 
ae -3 3 0 O| | xz w 
a 1 0 00 @3 Y3 
- TMP 
= B(t)P 


2. Slope of a Bézier Curve 
Now that we have an equation for a degree n Bézier curve, we will derive the 
equation for the first derivative at a particular ¢, which in turn gives the slope at a 


point on the curve. Since Bézier curves are parametric in t, we want to solve 


d n ‘ : n T 
F(be())” = SBOP: 


From [Ref. 2: page 46] we have 


© BR) =n BRy (t)—n BP-'(t) 








and, therefore, 
d Te ~ Tom nu 
S(bs(t))? = Ye Bry -n Br ypt 
2=0 


(nO — BY-*(t)])b2 + (n[BE-"(t) — BE-*(#)])b? +... 
+(n[BRz}(t) — 0))b2 


After combining like terms of n B?~'(t) we have, finally 


n Be-*(t)(b; — bg) +n BP (t)(by — bi) +... 


+ 
iii 
oO3 
on 

oo. 
— 
— 

~ 
I 


+n Braz (t)(b; — by-1) 
= n>) BP“ (t)(bi,, — b) 
= n>. Br l(t) Ab; (1.7) 


where A is the forward difference operator. 


Equation (1.7) lends itself to representation in linear algebra form as follows; 


bo 
Cage Sa aes i by 
qibalt))? = n[ Bo") BM) Baw A] 
| ee 
= nB(t)AP 
where A € R”*("+1) and has the form 
—l1 1 0 0 0 
0-1 1 O 0 
0 0-1 1 
For example, consider the case where n = 3. Since we have that 
B2(t) #2 — 241 1-2 1] | # 
(BQ)? = | Bat) | = | -24+2¢ |=} -2 2 of | 2 
B3(t) i? 1 0 0 1 








we can write 


bg 

| =o 4 -1 1 0 0 
< (pat)? =3[2 t | —2 2 0 0-1 1 0{| 
di. 07) 7 7 bZ 
1 0 0 co at 4 : 
b; 


We can obtain the first derivative at m points along a Bézier curve by evaluating 


Equation (I.7) for a given vector t € R™ and t; € [0,1]. For example, when n = 3 


we have; 
i? 4. 7 bi, 
ee 4 1 —2 1 —] 1 0 0 pt 

d t 2 

=, (bi(t))” = 3 a" "Tl 9 9 9 0-1 1 0 
- a = b? 
] 0 Q 0 0 -l1 1] - 
cae 4 b3 


There are similar equations for higher derivatives, for these see [Ref. 2]. 

















Figure 1. Cubic Bézier Curve with Control Points 





0 1 2 3 4 5 6 


Figure 2. Cubic Bézier Curve with Reordered Control Points 
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Ii. PROBLEM STATEMENT 


This chapter presents the problem of fitting a given ordered set of data with 
a Bézier curve in the total least squares sense. We will see how the linear algebra 
representation of Bézier curves lends itself to solving this problem. We refer to the 
point along a Bézier curve determined by a particular node t; as the point T;. So, 
from Equation (1.4) we have 


Ti = bo (ti) 


A. PROBLEM STATEMENT | 
The problem to solve is stated as follows: given an ordered set of m data 

points d;, 27 = 1,2,...,m find a set of control points b;, 7 = 0,1,...,n, where n is 

the order of the Bézier curve, and a vector t of nodes, 0 < 4; < te < ++: < ty <1 


that minimize 


|| B(t)P — Dlr (11.1) 


where the matrix D contains the data points. Though neither the matrix P* nor 
the vector t* that minimize Equation (II.1) are unique, and though in practice the 
resulting value of Equation (II.1) is determined in part by the stopping criteria of 
the algorithm, for ease of reference we will equate minimizing Equation (II.1) with 
determining {P*,t*}. Also, we will only consider the case where n < m since for 
n > m we could produce a curve which passes through all the data points and this is 
uninteresting in the context of this paper. 

Notice that we are minimizing the Frobenius norm of the residual in Equa- 


tion (II.1). For A € R™*”, this norm is given by 


*=1 9=1 


| A lle= (S52) 


l1 





Consider the case m = 3 and n= 1. The residual in Equation (II.1) is then 


Bo(ti) By (t) pt dj 
Bi(t2) Bi (tz) | dy 
Bo(ts) Bi (ts) d3 


Ba(ti) Bi(ti) 0 0 dy x 

Bo(t2) Bitz) 0 0 bo,« dp. 

Bo(ts) Bits) 0 0 bie | | dae a1.) 
0 0 Bb(ti:) Bi(ts) bo y diy 
0 0 Bi(te) Bite) bry dis, 
0 0 Bo(ts) By(ts) d3.y 


where, for example, bp, denotes the y component of the vector bo. We will call 


Equation (II.2) the uncoupled form of the residual in Equation (II.1). 


B. THE SEPARABLE LEAST SQUARES PROBLEM 
Minimizing Equation (IJ.1) is a nonlinear least squares problem because it 1s 
nonlinear in t. Minimizing Equation (II.1) is also separable because we can separate 
out from the original nonlinear problem a linear least squares parametric functional 
problem for the matrix P. In particular, for a given vector t, the minimizing matrix 


P of the residual in Equation (II.1) satisfies 
P= B*(t)D 


where B*(t) is the Moore-Penrose generalized inverse, or pseudo-inverse, of B(t). 
Note, since we have that P = B*(t)D for a given t that we are able to separate 
the unknown matrix P from Equation (JI.1). Therefore, the optimal vector t is found 


by minimizing the variable projection functional 
|| B(t)B"(t)D — D ||r 
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Further, consider the nonlinear least squares problem 


Bg(ti) Bi(ts) Bz (ts) bs dt 
B(t)P = : bi | =| : (1.3) 
Bo(ts) By(ts) Bo(ta) b3 di 


where the matrix P € R**? and the vector t € R* are unknown and n = 2. Note that 
if the vector t is given, then our least squares problem becomes linear. If, instead, we 
are given a matrix P, then Equation (II.3) is a nonlinear least squares problem for 


the vector t only. 


C. SHORTEST DISTANCE TO A CURVE 


Given an ordered set of data, minimizing Equation (II.1) determines the points 
7 which are nearest to their associated data points for a particular control point 
matrix P. We will refer to these particular points as the nearest points. For example, 
let 7;,, and 7;, represent the z and y values of the point 7; for a particular matrix P. 
We saw that the nodes and control points that minimize Equation (II.1) also minimize 


the 2 norm of Equation (II.2). This is equivalent to minimizing the objective function; 
(Tx ae dix) tele coe 7. 7 oe + (a ae diy) ai (Fe — 7 (11.4) 


For a particular matrix P, a change in the node ¢; only affects the point 
T;. Therefore, Equation (II.4) can be broken down into m independent objective 


functions. For example, the node ¢, only affects the following terms of Equation (II.4) 
(Fis at ee + Gin — d,y)° (11.5) 


Minimizing Equation (II.5) is equivalent to determining the nearest point for data 
point (di. , d;,,). Proceeding like this for each node, we can determine each nearest 
point and minimize Equation (II.1) for a particular matrix P. So we see that with a 
given matrix P the nonlinear least squares problem for the vector t is equivalent to 


solving the nearest point problem. 
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1. A Necessary Condition 
Treating t; as a variable, Equation (JI.5) is minimized by finding the stationary 
point of 
g(t1) = (Taye — die)? + (Ty — diy)’ 
The stationary point is such that 
d 


d d 
dt, ( 1) (71, I; ap 1; - ( 1,y 1) Fp ly 
In matrix notation this becomes 
d d Tix a di» 
one £4 — 0 (11.6) 
Ty diy 


So, we see that the point 7; which minimizes Equation (II.5) is perpendicular to 
the tangent vector at that same point. In general, this is a necessary condition for 
minimizing Equation (II.5). 

Figure (3) on page 18 shows a given ordered set of data and the degree 2 
Bézier curve produced from a given set of control points. Figure (4) on page 18 shows 
the points 7 gotten by solving the nonlinear least squares problem for the vector t. 
Notice that each 7; satisfies the necessary condition except for the point 74. Because 
our parameter values are constrained within [0,1], tg = 1 is the best we can do. 

Now, consider again the problem of minimizing Equation (II.4) except we are 
given an initial set of nodes. Figure (5) on page 19 shows the same data set as used 
in Figure (3) and the node associated with each data point. The nodes were chosen 


uniformly spaced on the interval [0,1] and are given by 


The degree of the Bézier curve we want to fit to the data is again two. After solving 


the linear least squares parameter functional problem for the control points, we can 
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plot the Bézier curve. Figure (6) on page 19 shows the resulting Bézier curve along 
with the points r. Note that point 73 does not minimize Equation (II.5) or satisfy the 
necessary condition. What the least squares solution for the control points produced 
was the best answer possible given the initial set of nodes and desired degree of the 


Bézier curve. 


D. ORDERED DATA 


In the problem statement at the beginning of the chapter, we are given an 
ordered set of m data points. We will order the data with respect to ¢ because Bézier 
curves are parametric in t. So, t; > ¢; implies that the data point associated with t; 
is ordered after the data point associated with ¢;. Therefore, the problem of ordering 
a set of data with respect to ¢ becomes one of determining an initial set of nodes. 

Consider a set of m data points and that we want to determine a set of control 
points which will produce an approximating Bézier curve. We need an initial set of 
nodes in order to solve the least squares problem for the matrix P, and we want the 
initial set of points 7 in the neighborhood of the nearest points. The reason for this 
last condition on the initial set of points 7 is explained in Chapter III. Since the 


elements of t must be contained in [0,1], we might use either the uniform method 





a4 
i=, t= |e okey 
or the chord length method 
|| di — di-1 lle 


t; = t+ ——— ——, i = 2,3,...,m 
Luin? | d; — d;-1 ll2 


where we define ¢,; = 0. 

The main problem with the uniform method is that it does not take into 
account nonuniform distribution of the data. The main problem with the chord 
length method is that it is not invariant under all affine transformations [Ref. 3]. For 
example, consider a set of data and that the user wants to scale one of the components 


by a constant while using the same initial set of nodes in the new scaling. Because 
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the chord length method is not invariant under scaling, the resulting Bézier curves 
would not have the same relationship to the data. 
The affine invariant chord method as described in [Ref. 3] takes into account 


nonuniform spacing of data and is based on the following metric 


t$1 7 4% D = [tina — 2a, Yinr — Yi] A [Zigd — Fi, Yis — yi] 
|D D;| 
| oY _ oxy Pas aie 
= [®inn — 2, Yin — Yi , os (11.7) 
ae a Yi41 — Yi 


where D; and D,4, are successive data points, 0% is the sample variance in the y 
components of the entire data set, 7% is the sample variance in the x components of 
the entire data set, oxy is the sample covariance in the x and y components of the 


entire data set, and 


o= 


ox oy — (oxy) 


The matrix A is the inverse of the covariance matriz used in the bivariate normal 
distribution function. 
The term metric means a way to measure distance. Normally, we think of the 


Euclidean metric 


T 
| Dig — D; |? [Tis — Li, Yit1 — yi] [ [Digs — 23, Yie1 — yi | 


= (i41 — 21) + (Yin — ys)” 


Note that in the case of the Euclidean metric A = J. The reason is that the Euclidean 
metric assumes no scaling or correlation between the data points. 

An improvement on the affine invariant chord method is the affine invariant 
angle method. This method combines the metric in Equation (II.7) and the bending 
of the data. Let three noncollinear data points be the vertices of a triangle. The 


bending of the data is how much of a turn is made when going from one side of the 
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triangle to another at a vertex. For details of the construction of this method see 
[Ref. 4]. The primary algorithm of this paper to solve the problem stated at the 
beginning of the chapter uses the affine invariant angle method to obtain the initial 


set of nodes. 
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Figure 4. Solution for the Nearest Points 
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1 2 3 4 5 


Figure 5. Data and knots for Linear Least Squares Problem 





1 2 3 4 5 


Figure 6. Solution to Linear Least Squares Problem 
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ITl. BASIC ALGORITHM 


This chapter presents the basic algorithm used in this paper to determine 
{P*,t*} and also presents three methods used to solve the nearest point problem. 


Finally, the primary algorithm to determine {P*,t*} is presented. 


A. BASIC ALGORITHM 
The basic algorithm used in this paper to determine {P*,t*} makes use of 
the separability of minimizing Equation (II.1) and follows an alternating projection 


approach. The basic steps are as follows: 


1. Determine the initial set of nodes. 


2. Solve the linear least squares parametric functional problem for the control 
points. 


3. Solve the nonlinear least squares problem for the nearest points. 


4. Repeat steps two and three until the algorithm reaches the stopping criteria. 


The difference between the three MATLAB functions used in researching this paper 


is the method each uses to accomplish step three of the basic algorithm. 


1. Approaching {P*,t*} 
Let us look at the idea behind the basic algorithm as a solution technique for 
determining {P*,t*}. Given an ordered set of data and an initial vector of nodes ty, 


after solving the least squares problem for the matrix P, we would have 
|| B(ti)P, — D ||p= oy (ITT.1) 


Note that, in general, the Bézier curve will not pass exactly through all the data 
points. 
As we saw in Figure (6) on page 19 the initial set of points 7 will most likely 


not be the nearest points, so some improvement to t, is possible. Solving the nearest 
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point problem for tz gives 
|| B(t2)Pi — D |lp=o2< 


Now we have an improved set of nodes but we have not improved the initial 
Bézier curve as described by P,. If there is a better fitting Bézier curve using the 


vector of nodes tz then solving the least squares problem for P» will result in 
| B(t2) Pe — D p= 03 < 02 


In this manner, we approach {P*,t*}. The above argument is not a proof of 


convergence. 


B. THE NEARESTPOINT METHOD 
Given a matrix P, the NearestPoint method is a code found in [Ref. 5] which 
determines the nearest point associated with each data point d; by numerically solving 


for the roots of 


Eh 
dt; 


For example, for a degree 3 Bézier curve, the left-hand side of the above equation 


|B(t:)P — d7] - —B(ti)P = 0 


is a degree 5 polynomial. A subroutine called FindRoots returns the real roots of 
the resulting polynomial which, after ensuring they are within the interval (0, 1], are 
used to determine points on the curve. The distance from the data point d; to each 
determined point on the curve along with the distance from the data point to each 
endpoint of the curve is compared, and the shortest distance indicates the parameter 
value of the nearest point. Because this is a numerical method, there is some degree 
of error in the solution. 

There are two notable differences between this method and the two other 
methods used to accomplish step three of the basic algorithm. First, whereas the 
NearestPoint code considers each nearest point separately, the linear algebra repre- 


sentation of Bézier curves is used by the two other methods to consider all the nearest 
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points at once. Second, note again that the basic algorithm is an alternating pro- 
jection approach where a new ie is produced with each new matrix P. In other 
words, the problem changes each time step two of the basic algorithm is performed. 
50, when accomplishing step three of the basic algorithm the two other methods do 
not continue to iterate to reach a numerical solution to the nearest point problem. 
Instead, the two other methods only move one step in the direction of the nearest 


points. 


C. THE GRADIENT METHOD 


The gradient method resulted from seeing the nonlinear least squares problem 
for the nearest points as an optimization problem requiring a gradient search tech- 
nique [Ref. 6: 220]. From an initial set of nodes, we want to change each value so 
that the points t approach the nearest points. However, instead of holding to the 
formal gradient search method we proceed as follows. 

Recall that in Chapter II we showed that the nearest point on a Bézier curve 


from a given data point d; will minimize 


| 7: — di |r (III.2) 
and satisfy the necessary condition 
d d TI dy x 
| tne dty 2% (III.3) 
Ty = diy 


Therefore, if we have an initial point 7; in the neighborhood of the nearest point, 
then by finding the point 7; which satisfies Equation (III.3) we find the nearest point. 


Recall that the cosine of the angle between two vectors a and b is 


Per ee 


| a {lB || 


Let a= oT; and b = rT; — d;. Given a point 7;, which is in the neighborhood of the 


nearest point, we want to change the node ¢; so that a- b approaches zero. Note that 
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the closer we get to satisfying Equation (III.3) the smaller the magnitude of a-b. 
Therefore, we use a-b to not only tell us what direction to move the node t; but also, 
it was initially thought, to give sufficient indication of how far to move. 

Consider the following example. Figure (7) on page 30 shows four ordered 
data points and a Bézier curve fitted to the data points. Since @ < 5 we have that 
a-b> 0. This tells us to move point 73 to the left, which corresponds to decreasing 


the value t3. We change ts with 
ts = t3 a Ats (III.4) 


where Atg is a scalar multiple of a- b. This iterative process is likened to bracketing 


the nearest point. 


1. Results of the Gradient Method 

The first problem with the method as presented is that Equation (III.3) is not 
a sufficient condition for minimizing Equation (II].2). Starting out with an initial set 
of points 7 in the neighborhood of the nearest points may overcome this problem m 
most cases, but it would still require an added check in any algorithm. 

Secondly, when we fit higher degree Bézier curves to data the nearest point 
problem becomes nonlinear and in these cases this method as presented is insufficient 
to consistently move the points r closer to the nearest points. The problem of de- 
termining the correct step size to take in the direction of the gradient requires more 
effort than relying on a user defined value of some scalar multiplying a- b. Instead of 
attempting a possibly involved search technique to overcome this obstacle, a better 


solution technique for the nearest points was sought. 


D. THE GAUSS-NEWTON METHOD 
For a given matrix P, the Gauss-Newton method is a better way to solve the 


nonlinear least squares problem for the nearest points. 
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1. Review of Newton’s Method 

Under appropriate conditions on the function f : t -> f(t), Newton’s method is 
known to converge quadratically for initial estimates in the neighborhood of the roots. 
The idea behind Newton’s method is that near a root we can model the behavior of 


the function with the following: 


M.(t) = f(te) + f'(te)(t — te) (111.5) 


where M,(t) is a linear approrimation of the function f and t, is an estimate to a root 
of f. The value ¢ such that M,(t) = 0 is an improved approximation. This improved 
approximation is then used as the next estimate. So, we see that Newton’s method 
is iterative with each iteration, or step, bringing the estimate closer to a root of f. 
This is also the idea behind the Gauss-Newton method for solving the nonlinear least 


squares problem. 


2. Solving the Nonlinear Least Squares Problem 
For a detailed discussion on the Gauss-Newton method, see [Ref. 7]. Let 


control point matrix P € R™** be known and that we want to solve the following: 
Bit)P—-D=0 | (III.6) 


where 0 is an mxX2 matrix of zeros. To change Equation (III.6) to a form solvable 


using the Gauss-Newton method, we need to uncouple the left hand side as 


Bo(ti) Betti) --: BR(t) 0 0 oe 0 bon dis 
Bo(t2) Bi (te) --: Bt) 0 0 0 bis dor 
BG (tm) BY (tm) =" Br (tm) 0 0 ae 0 bn dmx 
0 O «. .0 Bo(ti) Bilt) --- Bh) boy | | ary 
0 O s+ 0 Bo(t2) Bf(t2) --- Bit) bi y d2,y 
0 0 “ye 0 BS (tm) BY (tm) —_ By (bmn) bry diny 
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Let the above expression be the residual R(t). We see that R(t) is a system of 2m 
polynomials and, therefore, minimizing R(t) is in some sense similar to root solving. 
The Gauss-Newton method proceeds as follows. As in Newton’s method, we 


first model the behavior of A(t) with 


Mt) = Rte) + | FR(t0)] (t= to 


which is analogous to Equation (JJI.5). We will use the term Jacobian to describe the 


derivative of a vector function. Therefore, our Jacobian matrix is 


adR(ty e}z ee dR(tic)z 
dt; dtm 
dR(tm c)x _ dAR(tmc)x 
= dty ; dtm 
J(t.) = 
@Ritic)y |, ,  AR(tc)y 
dt, dtm 
dR(tme)y ... dR(tme)y 
dt, dtm 
Note that 
dR(t; a 
———— =O for: #7 
dt, Fj 


which also holds for the y components. Therefore, our Jacobian matrix has the 


following special form 


dR 1 2) 
Rtisle 0 0 

0 0 0 

0 0 0 

0 0 dR(tmc)z 

I(t.) = pi (III.7) 

aR(t; e)y 0 0 

dty 

0 0 0 

0 0 0 

aAR(tme 
0 g tndy 
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To get an improved estimate we want to solve for t so that 


M(t) = R(t.) + J(t.)(t — te) 


= 0 


or, equivalently 


I(te)t = I(te)te — R(te) (111.8) 


where the right hand side is a known vector in R2", Note that Equation (III.8) is a 
linear least squares problem for the vector t. Formally, we proceed by forming the 


normal equations 
J7 (t.)J(t.)t = J*(t,)J(t.)t. — J7(t,) R(t.) 


where J7(t.)J(t-.) is a square matrix and invertible. Though one could argue for a 
more numerically stable method to solve this least squares problem, in our particular 
case the normal equations lead to a simple expression for the change to make in 
our nodes and to results which are favorable when using the alternating projection 


approach. So, after multiplying both sides by (J Tte)J (t.)) we have 
T (4 “fat | 
t=t.—(JM(t.)J(t.)) IT (t.) R(t) 
Therefore, our improved estimate t, is given by 


“a 


t. 


te— (J*(te)J(te)) JP (te) R(te)  (IL9) 
t, — A(t.) 


ll 


Our matrix J(t,) from Equation (III.7) has a special form so that the right 
hand side of Equation (III.9) simplifies. First, note that J7(t.)J(t.) is a diagonal 


matrix with elements on the main diagonal of the form 


dR(tic)e\”  (dR(ticly\” - . 
( it. + [eee fori =1,...,m (IIT.10) 
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Therefore, (J7(t,)J(t.))~! is another diagonal matrix with elements which are the 
reciprocals of those described by Equation (III.10). The term J7(t.)R(t,) is a vector 


in 2” with elements of the form 


dR(tie)» dR(ti, 
Rte) io + R(t,ey a (111.11) 


Combining the results from Equation (III.10) and Equation (III.11), we have 
the following expression for each element of A(t,), 


dR(t;).\?  (dR(tiy\?\ dR(t;.) dR(ti.)y 


Equation (III.12) is easy to program into MATLAB and very inexpensive. 
Note that t, is one step in the direction of the nodes which will minimize R(t). That 
is, t- is one step in the direction of the nearest points. 

The effectiveness of the Gauss-Newton method depends in part on our initial 
points + being in the neighborhood of the nearest points. Just like the Newton 
method, a poor initial estimate could cause the method to fail. Therefore, we use the 
affine invariant angle method in our primary solution algorithm to get the initial set 


of nodes. 


3. Stopping Criteria 

Because relative error is a more meaningful indicator of change for larger 
values, the stopping criteria used in this paper to determine {P*,t*} is determined 
by the value of || R(t;41, P41) |l2 When the 2 norm of this residual is greater than 
one, the stopping criteria is the relative error. When the 2 norm of the residual is 


less than or equal to one, the stopping criteria is the absolute error expressed as 
| R(tj41, Pir) — R(tz, P;) [le 


where 


R(t;, P;) = B(t;)P; —D 
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is the residual of iteration 7. Note that the residual depends on the nodes and the 
control points. When either measure of error gets below a user defined value, the 
function stops improving the fit of the curve. 


We could also use the absolute error 
Il tia. — t; lle 


However, a small change in our estimates does not necessarily mean we are close to 
the nearest points. It is the curve we are fitting to the data and not the vector t, so 
it is best to use the curve itself to indicate when we are close enough to determining 


{P*,t*}. 


KE. PRIMARY ALGORITHM 

The following is the primary algorithm this paper uses to determine { P*, t*}. 
It is presented to aid in the understanding of the MATLAB function grad7.m. The 
MATLAB functions included below are provided in the Appendix. 

Step one. The user sends grad7.m the data matrix D E R™*? in the sequence 
he wants the data ordered, the degree Bézier curve to fit to the data, and an exponent - 
value which determines the stopping criteria. The function aff_angle.m is called as 
a subroutine and returns the initial set of nodes t, using the affine invariant angle 
method. The function mzbern2.m is called as a subroutine and returns the Bernstein 
matrix B(t,). Then, the linear least squares parametric functional problem B(t,)P; = 
D is solved for the matrix P, of control points. Finally, the residual R(t,, P,) is 
calculated. | 

Step two. Begin the while loop. The stopping criteria is checked, and, if met, 
we exit the while loop. Otherwise, the derivative of R(t;, P;), which is the derivative 
of B(t;)P;, is determined as presented in Chapter I again using mzbern2.m. We now 
use Equation (III.12) to obtain the improved estimate, where t;,,; = t;. Thus, we 
take only one step in the direction of the nearest points. Once we have the improved 


estimate we ensure its elements are within the interval [0,1]. 
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Step three. Solve the least squares problem for the matrix Pj,;. The residual 


R(ti+1, Pi41) is calculated. Return to step two. End of the while loop. 





O { 2 3 4 5 6 


Figure 7. Moving Nodes with Gradient Method 
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IV. METHOD EXAMPLES 


In this chapter we will consider graphic examples of the different MATLAB 
functions used while researching this paper to determine {P*,t*}. The difference 
between the functions is how each handles the nonlinear least squares problem for 


the nearest points. The MATLAB functions examined in this chapter are: 


1. grad3.m. Uses the gradient method. 
2. grad5.m. Uses the NearestPoint method. 


3. grad7.m. Uses the Gauss-Newton method. 


A. GRAPHICAL COMPARISONS 
This section examines graphical examples of the three MATLAB functions 
above and also the graphical difference between the two affine invariant node spacing 


methods presented in Chapter II. 


1. Gradient Method and Approaching {P*,t*} 

As stated in Chapter III, the gradient method as developed for this paper was 
insufficient to consistently move the points 7 closer to the nearest points. Figure (8) 
and Figure (9) on page 37 show that the algorithm reaches a stage where the points 
T cycle back and forth along the curve. The two curves are similar, but we are no 
longer approaching a better fit. By the third iteration the control points cycle back 


and forth between 





and 


0.8297 
P, = | 6.0947 
1 5.0183 


0.7802 
Py = | 5.9106 
4.9913 


3] 


0.8911 
9.5031 
1.0117 


0.7811 
9.6305 
0.9784 





The nodes cycle back and forth between 


0.0000 
0.1626 
0.3566 
1.0000 


a 


and 
0.0413 


0.1011 
0.4272 
1.0000 
Though either of the curves seem like a pretty good fit, an algorithm that approaches 


{P*,t*} is what any user expects and, therefore, this method is unacceptable. 


2. Total Least Squares Versus Linear Least Squares 

We might ask what the difference is between a fitted curve produced by tra- 
ditional linear least squares and a fitted curve produced by the total least squares 
function grad7.m. Figure (10) and Figure (11) on page 38 are both cubic curves fitted 
to the same data set but using these two different approaches. 

We notice that each curve makes a different assumption about the behavior 
of the data about the point d,. Most important to the user is that the linear least 
squares fit is only assuming error in the y values, while grad7.m is minimizing error 
in the z and the y values. This is more practical since the input, or the z values, will 


also often contain some degree of error. 


3. Effect of the Initial Set of Nodes 

We will now see how the initial set of nodes may determine the fit of the ap- 
proximating curve. Figure (12) and Figure (13) on page 39 show the Bézier curves 
returned by grad7.m using two different affine invariant node spacing methods. Fig- 
ure (12) reflects the affine invariant chord method while Figure (13) reflects the affine 


invariant angle method. Both figures show the nodes associated with the data points. 
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Because the affine invariant angle method takes into account the bending of 
the data, the approximating curve fits more naturally. This is opposed to what we 


have in Figure (12) where it looks like the curve is alternately stretched and slack. 


4. Failure to Maintain Ordering of Data 

Figure (14) and Figure (15) on page 40 are two approximating curves for the 
same set of data. Both curves were generated by grad7.m with stopping criteria of 
10-!. Figure (14) is the result of using the affine invariant chord method to obtain 
the initial set of nodes and Figure (15) is the result of using the affine invariant angle 
method. | 

In the same figures, notice the difference in interpolating about the point dg. 
When the order of the data is not maintained the user gets an entirely different 
picture of the trend in the data. In general, whether the methods presented in this 
paper maintain the ordering of data is dependent on the initial set of nodes and the 
order of the curve the user fits to the data. For example, using the same data set as 
in Figure (15) and a degree 2 curve, the function grad7.m, using the affine invariant 
angle method, will fail to maintain the order of the data about d4. 

Consider the function grad5.m. Because the NearestPoint method is free to 
look anywhere along the curve to solve the nearest point problem, grad5.m more often 
fails to maintain the ordering of data. This is opposed to the Gauss-Newton method 


which assumes that each initial point 7; is in the neighborhood of the nearest point. 


5. Gauss-Newton Method and Data with Noise 

Consider a set of data that lies on a cubic curve and that the data is then 
transmitted to a receiver. In the transmission of the data a small amount of noise 
gets added. How well does grad7.m fit the data with the noise added, and how much 
like the original curve is the resulting fitted curve? Note, in the following examples 
all random sets of numbers were generated using the MATLAB rand command which 


returns a uniformly distributed set of numbers on the interval (0, 1]. Figure (16) on 
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page 41 shows a random sampling of 10 data points from a cubic Bézier curve which 
was generated from a random matrix P. Noise was added to each data point by 
generating a random set of points around the unit circle and scaling each point by 
.012. Figure (17) on the same page is the curve fitted by grad7.m. 

This example shows one of the limitations of grad7.m even when using an 
effective initial node spacing method: grad7.m views the cluster of data points in the 
upper right hand side of the plot as just another set of points to fit with a curve. This 
is why we get the sharp bend within the cluster. The user who wants to avoid this 
type of behavior could, for example, substitute one data point for the entire cluster 
or use a weighted least squares approach to fitting the data. 

Figure (18) on page 42 is the same set of data but now fitted with a degree 5 
curve. This plot reflects what occurs when the fitted curve has too much freedom. A 
degree 5 curve has more freedom than needed for the data, and grad7.m makes use 


of all the freedom available in order to improve the fit. 


6. Gauss-Newton and Real World Data 

This section briefly presents a real world problem where the function grad7.m 
could be used. The data for this section comes from recording positions along a road 
leading to Fort Ord, California with a Global Positioning System (GPS) receiver. 
Many similar data sets are gathered using several different receivers along the same 
route and curves are then fitted to the data. The fitted curves are used to determine 
the bias present based on the particular satellites being used by the receivers. Imagine 
the road in the z-y plane. There will be error in both the xz and y coordinates of each 
location in the data set, so we will want to fit a curve to the data in the total least 
squares sense. Figure (19) on page 43 is the data and the degree 3 curve fitted to the 


data using a stopping criteria of 107‘. 
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B. GAUSS-NEWTON VERSUS NEARESTPOINT 
We now look specifically at the performance of grad7.m and grad5.m. We will 


consider both graphical results and computer time as indicators of performance. Both 


functions get their initial set of nodes using the affine invariant angle method. 


1. Graphical Results 

We will consider cubic Bézier curves on which we select 11 evenly spaced 
points with respect to the parameter ¢ and then add a small amount of random noise, 
as above, to each point. Each of the two functions fits a cubic Bézier curve to the 
resulting data set and the original and newly generated curve are displayed on one plot 
to compare how well each function performs. Figure (20) on page 44 to Figure (25) 
on page 46 are pairs of plots for three different data sets. The solid line is the curve 
returned by the functions. 

Note that in each case the pair of plots is slightly different. Regardless, the 
overall graphical comparison of the two functions, after conducting many other ex- 
amples not shown, is that they perform equally well except when grad5.m fails to 


maintain the ordering of data. 


2. Computer Time and Iterations 

Table I on page 36 reflects how much computer time was used for several sizes 
of data sets and stopping criteria of 107*. Each data set is a matrix of uniformly 
distributed random entries and ordered with respect to the z and y values. Because 
the NearestPoint method finds each new node separately, it is part of a for loop 
within grad5.m. As the data sets get larger, the function slows down in comparison 
to grad7.m where the form of the normal equations in the Gauss-Newton method 


remains relatively fast. 


C. GENERAL COMPARISON OF FUNCTIONS 
Table II on page 36 reflects the general results of researching this paper by 


comparing the three solution functions examined in this chapter. 


39 





Size | Time (sec) [er 
[__[ grad5.m | _grad7m_ 
1000 


Table I. Computer Time Used 






General Results 


Inconsistently indicates how large a step to take in the direction 
of the nearest point. 


Faster than grad7.m in rare cases. 


Graphically, performs equally well as grad7.m. 


Code is already provided but more difficult than grad7.m to 
examine and follow, see [Ref. 5]. 


More readily fails to maintain the ordering of data. 


More expensive to run on larger data sets and many smaller 
data sets as seen during research. 


Faster than grad5.m in most cases. 


In the many examples conducted during research, has not failed 
to reach a reasonable stopping criteria. 


Simple to code and algorithm is easy to follow. 





Table II. General Comparison of Functions 
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Figure 9. Gradient Method Fails to Converge 
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Figure 10. Linear Least Squares Fit 





Figure 11. Gauss-Newton Fit 
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Figure 12. Affine Invariant Chord Method 
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Figure 13. Affine Invariant Angle Method 
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Figure 14. Failure to Maintain Order 
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Figure 15. Affine Invariant Angle Maintains Order 
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Figure 16. Data With Added Noise 


Figure 17. Fitting Data With Added Noise 
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Figure 18. Curve With Too Much Freedom 
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Figure 19. Real World Data and grad7.m 
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Figure 20. Data Set One and grad7.m 
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Figure 21. Data Set One and grad5.m 
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Figure 22. Data Set Two and grad7.m 





Figure 23. Data Set Two and grad5.m 
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Figure 24. Data Set Three and grad7.m 





Figure 25. Data Set Three and gradd.m 
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APPENDIX. MATLAB FUNCTIONS 


GRAD7.M 


function [p,t,info]= grad7(d,deg,stop) 


sf Sk Ss Sf Sf SS St 


ze 


GRAD7.M This function takes a given set of ordered data and returns 
the parameter values (nodes) and control points which determine the 
Bezier curve that fits the data in the total least squares sense. 
Instead of minimizing the vertical distance to the curve we minimize 
both vertical and horizontal distance. The central feature of this 
function is the use of the Gauss-Newton method to estimate the 
nearest point along the Bezier curve from each data point. 


Input Arguments: 


Output Arguments: 


Basic Algorithm: 


info 


(i x 2) matrix of ordered data points, 
1=2,3,... 

degree of the curve the user wants to fit 
to the data 

stopping criteria number which in the 
algorithm becomes 10° (stop) 


control points for best fit Bezier curve 
nodes for best fit Bezier curve 

(2 x 1) vector which has the final norm of 
the residual and the number of iterations 
to convergence 


1. Determine and plot the best fit Bezier curve by 
solving a linear least squares problem using an 
initial set of nodes based on the 
*spread’ and ’bend’ of the data. 


2. Perform the following until the stopping 
criteria is met: 


. Determine a new set of nodes 
which estimates points on the current Bezier 
points. 


Al 


h 
h 





b. Determine the control points for the Bezier 
curve which will best fit the data using the 
new set of nodes. 


ALGORITHM STEP 1. 


Check the hold state so it can be returned to how the user had it. 


if (ishold) 


hold_was_off = 0; 

else 

hold_was_off = 1; 

end 
4 ?i’ is the number of data points and ’j’ is the number of control 


points required for the degree of the curve specified by the user. 
aff_angle(d) returns the initial set of nodes, a vector of ’71?’ 
elements, based on the ’spread’ and ’bend’ of the data. 

= size(d,1); 

= degt1; 


aff_angle(d) ; 


’bez_mat’ is a (i x j) matrix which is determined by the nodes 
and the degree of the curve desired. Since we would 


like to fit the data exactly, we should solve: 
d = bez_mat * p 
for the desired (j x 2) matrix of control points ’p’. This is a 


linear least squares problem since the nodes are known. ’p’ 


4 is determined by: 

A 

y/ p = pinv(bez_mat) * d 

y/ 

% which is the same as using the matlab ’backslash’ command. 
bez_mat = mxbern2(t,deg) ; 

Pp = bez_mat \ d; 


h 


Once we have the control points ’p’, we can determine the Bezier 
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4 curve and the points on the curve associated with the initial vector 
At. ?*y’is the (i x 2) matrix of points on the Bezier curve 

4 associated with the initial vector t. ‘ti’ is a closely spaced 

4 set of parameter values which will produce the Bezier matrix, 

4’ bez_mat_1’, which will give enough points on the fitted Bezier curve 
4 for matlab to plot a smooth looking curve: ’yi’ is the matrix of 

4 these points. 


y = bez_mat * p; 

t1 = [0:1/128:1]’; 
bez_mat_1 = mxbern2(ti,deg); 
yl = bez_mat_1i * p; 


4 Now we plot the results for the user. A legend is provided and the 
4 axes are made ’equal’ to eliminate distortions. 

4 ?Pause’ will keep the plot displayed and delay this function until 
4 the user presses any key on the keyboard. 


figure 

plot(y1¢:,1),y1¢: ,2)) 

hold on 

plot(p(:,1),p(:,2),’*’) 

ploucy Cs Lay lly 2) 

plot(d(:,1),d(:,2),’+’) 

axis(’ equal’) 

legend(’-’,’Fitted Curve’,’*’,’Control Points’,... 
?o’,’Initial Times’ ,’+’,’Data Points’) 

pause 


% ALGORITHM STEP 2. 


% We will perform steps 2a and 2b within a ’while’ loop with stopping 
4 criteria to meet in.order to end the loop. Our stopping criteria 

4 is based on the relative change of the residual. We also initialize 
4 the iteration counter to zero. The ’tic’ command starts a clock so 
4 that the user will know how much computer time was required to 

4 meet the stopping criteria. 


iter = Q; 
resid_old 0; 
resid_new = bez_mat * p - d; 
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tic 


while norm(resid_new - resid_old)/max(1 , norm(resid_new)) > 10°(stop) 


h 


vA 


Algorithm step 2a. Each iteration of the ’while’ loop produces a 
new vector t by solving the nonlinear least squares problem 


B(t) * p=d 


where ’p’ is the vector of control points, ’d’ the matrix of data 
points, and ’B(t)’ is the Bernstein matrix with unknown parameter 
values. Matrix ’B(t)’ is nonlinear in terms of the parameter 

values. The method used in this function is the Gauss-Newton method 
where we let the residual be R(t) = B(t) * p - d and we let the 
Jacobian matrix ’J’ be such that J_i,j = dB(t_i)/dt_j. 1.E., the 
(i,j) “th element of ’J’ is the slope along the Bezier curve at the 
i°th parameter value with respect to the j”°th parameter 

value. The Gauss-Newton method says that the change in parameter 
values which will minimize the residual is given by 


delta_t = -inv(J’ * J) + J’ * R 
where ’J’ and ’R’ are evaluated at the current parameter values. 


To compute the gradient at a point on the Bezier curve, we need the 
forward difference of the control points. I.E., we needa 

(j-1 x 2) matrix where the entries are p_iti - p_i for i=1,..,j-1. 
The slope, ’deriv’, is then determined multiplying the Bernstein 
matrix for a degree-minus-one curve by the forward difference matrix 
of control points and then multiplying by the degree of the original 
Bezier curve. 


deriv = deg * mxbern2(t,deg-1) * (p(2:j,:) - p(1:j-1,:)); 


Now we have what we need for the Gauss-Newton method. Since to use 
this method the residual needs to be a vector, we simply take the 
y-values of the residual and append them to the bottom of the 
x-values. This is done using ’resid(:)’. Similarly, the Jacobian 
matrix’s elements must be for the new vector ’resid’. Each element 
of the matrix ’J’ is the x-value or y-value of the slope at each 
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% parameter’s point. Since the slope at any point on the curve 
4 doesn’t change with any parameter other than it’s own, most of the 
4% entries in ’J’ are zero. 


4 ?t’?, the new nodes are given by ’t = t - delta_t’ 

4 using the formula above. Because (J’ * J) and J’ have a form we 

4, know in advance, we can form ’delta_t’ using less computer time and 
/, flops. 


t =t - (deriv(:,1).*resid_new(:,1) + deriv(:,2).*resid_new(:,2)) ... 
./ [deriv(:,1).°2 + deriv(:,2).°2] ; 


4, Now we have a new vector t, but we want to make sure the values 
% are between 0 and 1. In most cases with well behaved data, the 
% following rescaling of the nodes also results in the endpoints 
4 being associated with the nodes t_1=0 and t_m=1. 


t = -min(t)*ones(i,1i) + t; 

t = t/max(t); 
7, With ordered nodes we now want the new control points so 
4 that we can reproduce the points ’tau’ on the curve for the 


4 next iteration of the ’while’ loop or to be used in the final plot 
%Z if the stopping criteria is met. Note that if the condition is met 
4 we can also use the below ’bez_mat’ matrix for the final plot. We 
4 also compute a new value for ’resid_new’ and update the iteration 
counter. | . 


oS 


bez_mat = mxbern2(t,deg); 
Pp = bez_mat \ d; 


resid_old = resid_new: 
resid_new = bez_mat * p - d; 


iter = iter + 1; 


4 End ’while’ loop and stop computer time clock to show user how long 
4 it took to converge. 


end 


toc 


ol 





4, Now that we have the best fit vector t and the associated 

4 matrix ’p’ of control points, we are ready to plot the final 

4 solution for the user. ’y’ are the points on the Bezier curve 

4 associated with the vector t. ’yi’ are the closely spaced 

4 points on the Bezier curve which matlab will use to plot a smooth 
% looking curve. 


y bez_mat * p; 
yi = bez_mat_l * p; 


74, Finally, we clear the current plot and plot the results. 


figure 
plot(y1(:,1),y1¢:,2)) 
hold on 
plot(p(:,1),p(:,2),’*’) 
plot(y(:,1),y¢:,2),’0o’) 
plot(d(:,1),4dC:.2) 42) 
axis(’ equal’) 


4 Include legend on final plot. 


legend(’-’,’Fitted Curve’,’*’,’Control Points’,... 
19’, ’Nodes’,’+’,’Data Points’) 


% Return hold state to however the user had it. 
if (hold_was_off) 

hold off; 

end 


info = [norm(bez_mat*p-d), iter]; 


% End GRAD7.M 
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MXBERN2.M 


function [B] = mxbern2(t,d) 


% MXBERN2.M This function creates a Bernstein matrix of degree d 

/Z using the values in the column vector t. A Bernstein matrix 

4 is a generalized Vandermonde matrix whose (i,j) entry is 

4 B.{j-i}°d(t_i). The vector t must be a column vector with values 
/Z between 0 and 1. Copyright (c) 3 December 1994 by Carlos F. Borges. 
% All rights reserved. Modified by permission for this paper. 


[n m] = size(t); 
if (m ~= 1) 

error(’t must be a column vector.’); 
end 


4, Check to see if nodes are within [0,1]. 
if min(t) <0 | max(t) > 1 

error(’nodes are not within [0,1]’) 
end 


4 Build the Bernstein matrix. 


ct = i-t; 
B zeros(n,dt1); 


fori=0:d 
BC: ,it1) = (t.7i).*(ct.*(d-i)); 
end 


7, If d > 22 we form the Bernstein matrix differently to 
% avoid roundoff. 


if d < 23 


B = B¥diag( [1 cumprod(d:-1:1)./cumprod(1:d)] ); 
else | 
B = Bediag(diag(flipir (pascal (d+1)))); 

end 


%, End MXBERN2.M 
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AFF_ANGLE.M 


function [h] = aff_angle(X) 


Pa PSs 
a. 
<A 
a 


AFF_ANGLE.M This function returns an affine invariant vector of 
nodes for a given set of ordered data X. It is 

assumed that the user sends this function the data arranged in a 
(n x 2) matrix ordered from row one to row n. 


Obtaining the covariance matrix A is from a function AFF_KNT.M which 
is copyrighted on 3 December 1994 by Carlos F. Borges and appears 
here with his approval. The affine invariant angle method of 
obtaining nodes is found in a paper by Thomas A. Foley 

and Gregory M. Nielson, "Knot Selection for Parametric Spline 
Interpolation", in the book "Mathematical Methods in Computer Aided 
Geometric Design", Academic Press, Inc., 1989. Notation from this 
paper is used to annotate the steps in this algorithn. 


= size(X,1); 
= X - ones(size(X)) * diag(mean(X)); 


Xbar’ * Xbar/n; 
inv (Xcov) ; 


i) 


Obtain the node spacing values using the metric 


t_i = M[X] (X_i,X_(it1)). 


X(2:n,:) - X(i:n-1,:); 


diag(V * A * V’) .* (1/2); 
Obtain the values for 


M*2[X] (X_(i-1) ,XGi+1)) 


NO 
il 


X(3:n,:) - XCi:n-2,:); 


Le) 
il 


diag(V_2 * A * V_2’); 
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4, Get theta_i values. This is what takes into account the bending 
4, of the data. 


theta = zeros(n-1,1): 


for j} = 2:n-1 


theta(j) = min( pi - acos( (t(j-1)°2 + t(j)°2 - ... 
t_2(j-1))/(2*t(j)*t(j-1)) ) , pi/2); 


end 
% Obtain the affine invariant angle node spacing values h_i. 
h = zeros(n-1,1); 


h(1) = t(1)*( 1 + (1.5*theta(2)+#t(2))/(t(1) + t(2)) ); 


for j = 2:n-2 

h(j) = t(j) * C1 + (1.5#theta(j)#t(j-1))/(e(j-1) +t (j)) +... 
(1.5*theta(j+1)*t (j+1))/(t(j)+ tCj+i)) ); 

end 


h(n-1) = t(n-1) * ( 1 + (4.5+#theta(n-1)*t (n-2))/(t(n-2)4t(n-1)) ); 


i Now that we have the node spacing values, we want to normalize 

4 them so that they are within [0,1], with the first data point being 
4 associated with the value zero and the last data point with the 

4 value one. 


h = (0;h]; 
h = cumsum(h); 
h=h / h(n); 


% End AFF_ANGLE.M 


a9) 
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